Summary.

The marginalization paradox involves a disagreement between two Bayesians who use two 
different procedures for calculating a posterior in the presence of an improper prior. We show 
that the argument used to justify the procedure of one of the Bayesians is inapplicable. There 
is therefore no reason to expect agreement, no paradox, and no evidence that improper priors 
are inherently inconsistent. We show further that the procedure in question can be interpreted 
as the cancellation of infinities in the formal posterior. We suggest that the implicit use of this 
formal procedure is the source of the observed disagreement. 
1. Introduction.

An important question in statistics is whether Bayesian inference can be extended to the 
setting of improper priors in a consistent and intuitively viable manner. The use of improper
priors was common throughout much of the twentieth century, and appears to be a
useful idealization for many applications. In the 1970s, however, two influential arguments 
appeared against the use of improper priors: the "marginalization paradox," and "strong 
inconsistency." These arguments appear to have convinced most statisticians that improper 
priors must be abandoned. 

In this paper we discuss the marginalization paradox, due to Dawid, Stone, and Zidek
(1973) (DSZ73). Let $p(x\mid\theta)$ be a normalized sampling distribution with parameter $\theta = (\eta, \zeta)$
and data $x = (y, z)$, and let $p(\theta)$ be a prior, which may be improper, i.e., of infinite total
probability. The marginalization paradox concerns the problem of calculating $p(\zeta\mid z)$, under
a certain set of assumptions. A first Bayesian, $B_1$, eliminates $\eta$ and then $y$; a second
Bayesian, $B_2$, eliminates $y$ and then $\eta$. The details of the procedures are given in DSZ73.
It is claimed that these procedures rely only on principles that would have to hold in any
intuitively viable theory of inference. If $p(\theta)$ is improper, however, $B_1$ and $B_2$ generally get
incompatible answers. It has been widely inferred that any extension of Bayesian inference
to the context of improper priors will be inconsistent.

The purpose of this paper is to show that the marginalization paradox does not imply
that the use of improper priors will lead to inconsistency. First, we show that the argument
used to justify $B_1$'s elimination of $y$ is invalid, because it is based on the application of
probabilistic intuitions to a formal quantity whose probabilistic meaning has not been
justified. The "paradox" is thereby resolved, since we now have no reason to believe that $B_1$'s
answer is correct, and no reason to insist that the answers of $B_1$ and $B_2$ be compatible.
Second, we analyze $B_1$'s procedure on its own terms, to get a better sense of what
is being assumed. The posterior $p(\zeta\mid z)$ is defined as a ratio, which is only formal when
the prior is improper because there are infinities in the numerator and denominator. $B_1$'s
procedure is equivalent to the assumption that these infinities will cancel. What DSZ73
have shown, therefore, is that there is no consistent extension of Bayesian inference in which
the cancellation law, assumed implicitly by $B_1$, holds when the prior is improper. But this
is only to be expected: it is analogous to the well-known fact that there is no consistent 
extension of arithmetic to the extended real numbers in which the cancellation law holds 
for infinity. The proposal that we abandon improper priors because of the marginalization 
paradox is analogous to the proposal that we abandon the use of infinity because it does 
not obey the laws of arithmetic. 
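The arithmetic analogy can be checked in IEEE 754 floating point, one concrete system that adjoins $\pm\infty$ to the reals. It is used here only as an illustration and is not the formalism of this paper. The Python lines below show why the cancellation law must fail once infinity is admitted.

```python
import math

inf = math.inf

# Multiplying infinity by any positive finite number gives infinity again,
# so these two products compare equal.
lhs, rhs = 2 * inf, 3 * inf
print(lhs == rhs)             # True

# If the common factor inf could be cancelled, lhs == rhs would imply 2 == 3.
print(2 == 3)                 # False

# Accordingly, the ratio inf / inf is left undefined (NaN) rather than set to 1.
print(math.isnan(inf / inf))  # True
```

IEEE 754 keeps the arithmetic consistent by refusing to assign a value to $\infty/\infty$, which is the floating-point counterpart of refusing to cancel infinite factors.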

In brief, the inconsistency of the marginalization paradox is based on an assumption that 
has not been justified intuitively and that is unreasonable mathematically. There is nothing 
in the marginalization paradox to preclude the existence of a formalism that justifies the 
careful use of improper priors. 



2. The intuitive argument. 

In this section we show that the validity of $B_1$'s argument has not been established, because
it is based on an intuitive probabilistic argument, and the distribution to which it is applied 
has not been shown to have a probabilistic meaning. In other words, we show that DSZ73 
have not made their case, because their argument contains a gap. 

In addition to the assumptions described in Section 1, we assume the following:

(1) The formal posterior, defined as

$$
p(\zeta\mid y,z) = \frac{\int p(y,z\mid\eta,\zeta)\,p(\eta,\zeta)\,d\eta}{\int\!\!\int p(y,z\mid\eta,\zeta)\,p(\eta,\zeta)\,d\eta\,d\zeta}, \tag{1}
$$

is independent of $y$. We denote the common value by $p_1(\zeta\mid z)$. Note that the value of
$p(\zeta\mid y,z)$ and the validity of the assumption itself depend on the prior.



(2) The marginalized sampling distribution,

$$
p(z\mid\eta,\zeta) = \int p(y,z\mid\eta,\zeta)\,dy,
$$

is independent of $\eta$. We denote the common value by $p_2(z\mid\zeta)$.

(3) For each value of $\zeta$, the prior is improper in $\eta$: $\int p(\eta,\zeta)\,d\eta = \infty$.

Assumptions 1 and 2 enable $B_1$ and $B_2$, respectively, to invoke intuitive arguments to
determine $p(\zeta\mid z)$, even though the formal calculations would lead to infinities. Assumption 3
is satisfied by all of the examples in DSZ73, and reflects the fact that we are really interested
in impropriety in $\eta$.
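To make Assumptions 1 to 3 concrete, the sketch below uses a hypothetical toy model that is not one of the DSZ73 examples: given $(\eta,\zeta)$, $y \sim N(\eta,1)$ and $z \sim N(\zeta,1)$ independently, with the flat improper prior $p(\eta,\zeta)=1$. Then $p(z\mid\eta,\zeta)$ is the $N(\zeta,1)$ density for every $\eta$ (Assumption 2), $\int p(\eta,\zeta)\,d\eta=\infty$ (Assumption 3), and Eq. (1) works out to $p_1(\zeta\mid z)=N(\zeta;z,1)$ whatever $y$ is (Assumption 1), even though the prior is improper. The code evaluates Eq. (1) by midpoint quadrature; the truncation half-widths and grid sizes are illustrative numerical choices, not part of the argument. The toy model is chosen only to exhibit the assumptions and does not by itself produce a disagreement between the two Bayesians.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def midpoints(lo, hi, n):
    """Midpoints of n equal cells covering [lo, hi], plus the common cell width."""
    h = (hi - lo) / n
    return [lo + (k + 0.5) * h for k in range(n)], h


def formal_posterior(zeta, y, z, half_width=40.0, n=300):
    """Eq. (1) for the toy model with flat prior p(eta, zeta) = 1.

    Hypothetical likelihood: p(y, z | eta, zeta) = phi(y - eta) * phi(z - zeta).
    Both integrals are truncated to [-half_width, half_width], which must be
    wide compared with |y| and |z|.
    """
    grid, h = midpoints(-half_width, half_width, n)
    # Numerator of Eq. (1): likelihood times prior, integrated over eta.
    numerator = sum(phi(y - eta) * phi(z - zeta) for eta in grid) * h
    # Denominator of Eq. (1): the same, integrated over eta and zeta jointly.
    denominator = sum(phi(y - eta) * phi(z - s) for eta in grid for s in grid) * h * h
    return numerator / denominator


def marginal_z(z, eta, zeta, half_width=60.0, n=2000):
    """Assumption 2 check: integrate p(y, z | eta, zeta) over y."""
    grid, h = midpoints(-half_width, half_width, n)
    return sum(phi(y - eta) * phi(z - zeta) for y in grid) * h


# Assumption 1: at zeta = 0.5 the formal posterior is the same for every y
# (about phi(0.5) = 0.3521).
for y in (-3.0, 0.0, 3.0):
    print("y =", y, "posterior =", formal_posterior(zeta=0.5, y=y, z=0.0))

# Assumption 2: p(z | eta, zeta) does not change with eta.
for eta in (-5.0, 0.0, 5.0):
    print("eta =", eta, "p(z | eta, zeta) =", marginal_z(z=0.0, eta=eta, zeta=0.5))
```

The denominator is computed as a genuine double sum rather than factored by hand, so the $y$-independence is checked numerically instead of being built in.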

We focus on only one aspect of the analysis in DSZ73, because we believe that aspect
to be the source of all of the difficulties. The aspect in question is $B_1$'s elimination of $y$,
which occurs after he has already marginalized over $\eta$. $B_1$ assumes that since $p(\zeta\mid y,z)$ is
independent of $y$, $p(\zeta\mid z)$ must be equal to the $y$-independent value of this function.
The justification that DSZ73 give for this assumption is intuitive, and has been
formalized as the "reduction principle," which is stated as follows in Dawid, Stone, and Zidek
(1996): "Suppose that a general method of inference, applied to data (y,z), leads to an
answer that in fact depends on z alone. Then we should obtain the same answer if we 
apply the method to z alone." The principle enables one to determine the answer to the 
problem with data z from the answer to the problem with data (y,z), provided that the 
latter answer depends only on z. We have no objection to this principle as stated. We wish 
to emphasize, however, that in order to apply the principle (or invoke the intuition behind 
the principle), we must first have the "answer" to a problem of inference, given data (y, z). 

The problem with $B_1$'s argument is that $p(\zeta\mid y,z)$ has not been shown to be the "answer"
to a problem of inference, so the reduction principle is inapplicable. We show below that in
the context of the marginalization paradox, any sampling distribution $p(y,z\mid\zeta)$ associated
with $p(\zeta\mid y,z)$ is necessarily improper, so that it has no inherent probabilistic meaning.
There is no reason to assume that the associated formal posterior will have any probabilistic
meaning, even if that posterior is proper. In the absence of such a meaning, $p(\zeta\mid y,z)$ is
not the answer to a problem of inference, $B_1$ is unable to use the reduction principle to
complete his argument, and the inconsistency vanishes.

We are not claiming that it is impossible to provide a meaning for an improper
distribution. Indeed, such an assumption would preclude the use of improper priors and prejudge
the whole issue. We are merely observing that in order to use the reduction principle, a
probabilistic meaning must be provided for $p(y,z\mid\zeta)$, and this has not been done. Even if
a meaning is provided, any manipulations of the distribution must be justified in terms of 
that meaning, and there is no guarantee that the resulting procedures will be the formal 
analogs of valid procedures for proper distributions. 

We now establish the impropriety of the sampling distribution. 
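Proposition: Let $p(\eta,\zeta)$ be given, and let $p(\eta,\zeta) = p(\eta\mid\zeta)\,p(\zeta)$ be any factorization of
$p(\eta,\zeta)$ such that $0 < p(\zeta) < \infty$. Under the above assumptions we have, for each $\zeta$,

$$
\int p(y,z\mid\zeta)\,dy = \infty. \tag{2}
$$

Proof:

$$
\int p(y,z\mid\zeta)\,dy = \int\!\!\int p(y,z\mid\eta,\zeta)\,p(\eta\mid\zeta)\,d\eta\,dy = \frac{p_2(z\mid\zeta)}{p(\zeta)}\int p(\eta,\zeta)\,d\eta = \infty.
$$

The interchange in the order of integration is justified by Tonelli's theorem. □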

An immediate corollary is that $\int\!\!\int p(y,z\mid\zeta)\,dy\,dz = \infty$. The factorization of $p(\eta,\zeta)$ is
nonunique, and this implies a nonuniqueness in the definition of $p(y,z\mid\zeta)$. The proposition
shows, however, that impropriety of the conditional distribution is independent of the choice
of factorization. Note also that although we are evaluating $B_1$'s argument, the proof depends
on Assumption 2, which was made for $B_2$'s benefit.
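The proposition can be watched numerically in a hypothetical toy model (not from DSZ73): given $(\eta,\zeta)$, $y \sim N(\eta,1)$ and $z\sim N(\zeta,1)$ independently, with the factorization $p(\eta\mid\zeta)=1$, $p(\zeta)=1$ of a flat prior, so that $p_2(z\mid\zeta)=\phi(z-\zeta)$. The sketch replaces the improper $p(\eta\mid\zeta)$ by its restriction to $|\eta|\le L$ (an assumption made only so that the integral is finite) and shows that $\int p(y,z\mid\zeta)\,dy$ grows like $2L\,p_2(z\mid\zeta)$, so it diverges as the cutoff is removed. As the interchange in the proof allows, the $y$-integral is done first, in closed form via `math.erf`.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncated_mass(z, zeta, L, cells=4000, y_margin=10.0):
    """Approximate the y-integral of p(y, z | zeta) with p(eta | zeta) = 1 cut off at |eta| <= L.

    Hypothetical toy model: p(y, z | eta, zeta) = phi(y - eta) * phi(z - zeta), and
    p(y, z | zeta) is the eta-integral of p(y, z | eta, zeta) * p(eta | zeta).
    """
    y_half = L + y_margin  # y-window holding essentially all of N(eta, 1) for |eta| <= L
    h = 2.0 * L / cells    # width of each eta cell (midpoint rule)
    total = 0.0
    for k in range(cells):
        eta = -L + (k + 0.5) * h
        # Inner y-integral of phi(y - eta) over [-y_half, y_half], in closed form.
        y_mass = std_normal_cdf(y_half - eta) - std_normal_cdf(-y_half - eta)
        total += y_mass * h  # prior weight p(eta | zeta) = 1 on this cell
    return total * phi(z - zeta)  # p2(z | zeta) depends on neither y nor eta


# The two columns agree, and both grow linearly in the cutoff L.
for L in (10.0, 100.0, 1000.0):
    print(L, truncated_mass(z=0.0, zeta=0.0, L=L), 2.0 * L * phi(0.0))
```

No finite cutoff is singled out by the model, so the only candidate value for the untruncated integral is the limit, which is infinite, in line with Eq. (2).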



3. The formal argument. 

We now consider $B_1$'s procedure on its own terms, as a formal procedure. We find that in the
case of a proper prior, $B_1$'s use of the reduction principle is equivalent to the cancellation of a
finite factor in a ratio defining $p(\zeta\mid z)$, and in the case of an improper prior, to the cancellation
of an infinite factor. It is well-known that the formal cancellation of infinities will generally
lead to inconsistencies. We conclude that when viewed formally, $B_1$'s procedure is highly
suspect.

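In general, the posteriors of $\zeta$ given $(y,z)$ and given $z$ are given formally by the following
expressions:

$$
p(\zeta\mid y,z) = \frac{p(y,z,\zeta)}{\int p(y,z,\zeta)\,d\zeta}, \tag{3}
$$

and

$$
p(\zeta\mid z) = \frac{\int p(y,z,\zeta)\,dy}{\int\!\!\int p(y,z,\zeta)\,dy\,d\zeta}. \tag{4}
$$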

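Under Assumption 1, $p(\zeta\mid y,z)$ is independent of $y$. Then

$$
p(y,z,\zeta) = p(y,z)\,p_1(\zeta\mid z), \tag{5}
$$

where $p(y,z) = \int p(y,z,\zeta)\,d\zeta$. Substituting Eq. (5) into Eq. (4), we obtain

$$
p(\zeta\mid z) = \frac{\int p(y,z)\,p_1(\zeta\mid z)\,dy}{\int\!\!\int p(y,z)\,p_1(\zeta\mid z)\,dy\,d\zeta} = \frac{p_1(\zeta\mid z)\int p(y,z)\,dy}{\int p(y,z)\,dy}, \tag{6}
$$

where the second equality uses the fact that $p_1(\zeta\mid z)$ does not depend on $y$ and integrates to one over $\zeta$.
When $\int p(y,z)\,dy$ is finite, the common factor cancels and $p(\zeta\mid z) = p_1(\zeta\mid z)$.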

If we also make Assumptions 2 and 3, the proposition implies that $\int p(y,z)\,dy = \infty$.
The assumption that $p(\zeta\mid z) = p_1(\zeta\mid z)$ is now equivalent, as claimed, to the assumption that
it is permissible to cancel infinite factors of $\int p(y,z)\,dy$ from the ratio defining $p(\zeta\mid z)$.
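One simple way to see why such a cancellation is not well defined uses the same kind of hypothetical toy model with a flat prior ($y\sim N(\eta,1)$, $z\sim N(\zeta,1)$, $p(\eta,\zeta)=1$), where $p(y,z,\zeta)=\phi(z-\zeta)$, $p(y,z)=1$, $p_1(\zeta\mid z)=\phi(\zeta-z)$, and $\int p(y,z)\,dy=\infty$. Truncating the $y$-integrals in Eq. (4) to a window $|y|\le Y\,w(\zeta)$, for a positive weight $w$, makes both integrals finite; their ratio is $w(\zeta)\,\phi(z-\zeta)/\int w(s)\,\phi(z-s)\,ds$, which does not depend on $Y$. A product window ($w\equiv1$) reproduces $p_1(\zeta\mid z)$, while a $\zeta$-dependent window (here $w(\zeta)=1+\zeta^2$, an arbitrary illustrative choice) gives a different limit as $Y\to\infty$. The sketch computes both.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncated_posterior(zeta, z, Y, window, zeta_half=30.0, n=3000):
    """Ratio in Eq. (4) for the flat-prior toy model, with y restricted to |y| <= Y * window(zeta).

    In the toy model p(y, z, zeta) = phi(z - zeta) does not depend on y, so the
    y-integral over the window is the window length times phi(z - zeta).
    """
    def y_integral(s):
        return 2.0 * Y * window(s) * phi(z - s)

    h = 2.0 * zeta_half / n
    grid = [-zeta_half + (k + 0.5) * h for k in range(n)]
    denominator = sum(y_integral(s) for s in grid) * h  # midpoint rule in zeta
    return y_integral(zeta) / denominator


def flat_window(s):
    """Product truncation: the same y-window for every zeta."""
    return 1.0


def tilted_window(s):
    """Hypothetical zeta-dependent truncation, chosen only for illustration."""
    return 1.0 + s * s


for Y in (1e2, 1e4, 1e6):
    print(Y,
          truncated_posterior(0.0, 0.0, Y, flat_window),    # stays at phi(0)   ~ 0.3989
          truncated_posterior(0.0, 0.0, Y, tilted_window))  # stays at phi(0)/2 ~ 0.1995
```

Both truncations remove the infinity "in the limit," yet they disagree; the value assigned to the formal ratio depends on how the infinite factor is approached, much as $\infty/\infty$ has no unique value.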



4. Discussion. 

We have observed that the inconsistencies uncovered in DSZ73 depend on formal
manipulation on the part of $B_1$. We have shown, in Sections 2 and 3, respectively, that $B_1$'s
procedure has not been justified intuitively, and is suspect mathematically. We therefore
see no reason to accept $B_1$'s reasoning, or to regard the validity of this reasoning as
necessary or desirable in any extension of Bayesian inference to improper priors. Once $B_1$'s
reasoning is rejected, the marginalization paradox disappears.

The core of our argument is the observation that $B_1$'s argument is formal because the
sampling distribution $p(y,z\mid\zeta)$ is improper. To the best of our knowledge, this observation
has not been made previously. The impropriety of the sampling distribution has perhaps
been obscured by its nonuniqueness and by the fact that the formal posterior can be
calculated from Eq. (1) without ever computing the sampling distribution explicitly.

Previous analyses of the marginalization paradox generally accepted the validity of both 
Bayesians' arguments. The problem then becomes one of understanding when and why the 
two Bayesians will agree. This analysis was initiated in DSZ73, which is mostly dedicated 
to this question. It turns out that for problems amenable to group analysis, consistency 
may be achieved by a uniquely determined prior. The priors determined by this constraint, 
however, are unsatisfactory for a variety of reasons, which DSZ73 explore in detail. They 
conclude that an acceptable theory is elusive or unachievable. 

The most persistent and insightful critic of the marginalization paradox has been the
late E. T. Jaynes. See Jaynes (1980a), Dawid et al. (1980), Jaynes (1980b), Dawid et al.
(1996), and Jaynes (2003) for his extended debate with the authors of DSZ73. We believe that
at the conceptual level, Jaynes' critique was fundamentally correct, in that he identified
the source of the inconsistencies as the formal manipulation of completed infinities. A
particularly elegant statement of this view can be found in Jaynes (2003). At the technical
level, Jaynes did not recognize that $B_1$'s argument was invalid, so he was forced to try to
determine how the two Bayesians could be reconciled. His thesis was that the disagreement
between the Bayesians reflected differences in their prior information. In our opinion, this
analysis was not entirely successful, and the correct approach is to reject $B_1$'s reasoning.

For general background on the marginalization paradox and related issues, we refer the
reader to the excellent review article of Kass and Wasserman (1996).